This study analyzes Spain’s energy generation during 2015 using two extensive datasets. One dataset contains hourly weather information, such as wind speed and temperature, from five Spanish cities. The other contains the hourly energy generation of various sources, such as nuclear, solar and wind energy.
From these datasets we can gain insight into Spain’s energy generation and the viability of the different energy sources. Visualizing the data lets us compare the efficiency and consistency of the different energy production methods.
Comparing and evaluating these methods is of great importance in a world that faces the challenges of accelerated global warming every day. Governments such as Spain’s should expect that combating the climate crisis will entail moving away from fossil fuels and toward sustainable energy sources. The findings of this study could inform policymakers and guide them toward the best investments for future energy generation while preserving energy security.
We will provide two data stories centering around wind energy, one of the most promising sustainable energy sources.
A story about the consistency of wind energy, emphasizing the energy source’s predictability, reliability, its established contribution, and the influence of wind velocities on the energy price.
A story about the drawbacks of wind energy, highlighting the lack of consistency and the way other energy sources compensate for this within Spain.
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
The two datasets consist of hourly data on energy generation and weather features from five cities in Spain. To make them easier to use, the datasets are merged on the timestamp, with the weather features averaged across the five cities. Irrelevant columns are dropped. Since weather patterns follow an annual cycle, only the first 8,760 rows (365 days × 24 hours, the first year) are used in the analysis.
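The row count can be sanity-checked with a small sketch. The frame below is a hypothetical stand-in with one row per hour, not the real merged data; the check assumes that no timestamps are missing or duplicated, which would have to be verified on the actual files.

```python
import pandas as pd

# Hypothetical stand-in for the merged data: one row per hour over 2015 and 2016
hours = pd.date_range('2015-01-01 00:00', '2016-12-31 23:00', freq='60min')
toy_df = pd.DataFrame({'time': hours})

HOURS_PER_YEAR = 365 * 24  # 2015 is not a leap year
first_year = toy_df[:HOURS_PER_YEAR]

# With gap-free, duplicate-free hourly rows the slice ends on the last hour of 2015
last_hour = first_year['time'].iloc[-1]
is_complete = first_year['time'].is_unique and last_hour == pd.Timestamp('2015-12-31 23:00')
print(HOURS_PER_YEAR, last_hour, is_complete)
```

If the real data has gaps or repeated timestamps, the same slice would end on a different date, so comparing the last timestamp against 31 December is a cheap way to notice this.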
energy_df = pd.read_csv('energy_dataset.csv').drop(['generation hydro pumped storage aggregated', 'generation marine', 'forecast wind offshore eday ahead'], axis=1)
energy_df.head()
| | time | generation biomass | generation fossil brown coal/lignite | generation fossil coal-derived gas | generation fossil gas | generation fossil hard coal | generation fossil oil | generation fossil oil shale | generation fossil peat | generation geothermal | ... | generation solar | generation waste | generation wind offshore | generation wind onshore | forecast solar day ahead | forecast wind onshore day ahead | total load forecast | total load actual | price day ahead | price actual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 00:00:00+01:00 | 447.0 | 329.0 | 0.0 | 4844.0 | 4821.0 | 162.0 | 0.0 | 0.0 | 0.0 | ... | 49.0 | 196.0 | 0.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 |
| 1 | 2015-01-01 01:00:00+01:00 | 449.0 | 328.0 | 0.0 | 5196.0 | 4755.0 | 158.0 | 0.0 | 0.0 | 0.0 | ... | 50.0 | 195.0 | 0.0 | 5890.0 | 16.0 | 5856.0 | 24934.0 | 24382.0 | 48.10 | 64.92 |
| 2 | 2015-01-01 02:00:00+01:00 | 448.0 | 323.0 | 0.0 | 4857.0 | 4581.0 | 157.0 | 0.0 | 0.0 | 0.0 | ... | 50.0 | 196.0 | 0.0 | 5461.0 | 8.0 | 5454.0 | 23515.0 | 22734.0 | 47.33 | 64.48 |
| 3 | 2015-01-01 03:00:00+01:00 | 438.0 | 254.0 | 0.0 | 4314.0 | 4131.0 | 160.0 | 0.0 | 0.0 | 0.0 | ... | 50.0 | 191.0 | 0.0 | 5238.0 | 2.0 | 5151.0 | 22642.0 | 21286.0 | 42.27 | 59.32 |
| 4 | 2015-01-01 04:00:00+01:00 | 428.0 | 187.0 | 0.0 | 4130.0 | 3840.0 | 156.0 | 0.0 | 0.0 | 0.0 | ... | 42.0 | 189.0 | 0.0 | 4935.0 | 9.0 | 4861.0 | 21785.0 | 20264.0 | 38.41 | 56.04 |
5 rows × 26 columns
li = range(13)
weather_df = pd.read_csv('weather_features.csv', usecols=li).drop('city_name', axis=1)
weather_df = weather_df.groupby('time').mean()
weather_df.head()
| time | temp | temp_min | temp_max | pressure | humidity | wind_speed | wind_deg | rain_1h | rain_3h | snow_3h | clouds_all |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-01-01 00:00:00+01:00 | 272.491463 | 272.491463 | 272.491463 | 1016.4 | 82.4 | 2.0 | 135.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2015-01-01 01:00:00+01:00 | 272.512700 | 272.512700 | 272.512700 | 1016.2 | 82.4 | 2.0 | 135.8 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2015-01-01 02:00:00+01:00 | 272.099137 | 272.099137 | 272.099137 | 1016.8 | 82.0 | 2.4 | 119.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2015-01-01 03:00:00+01:00 | 272.089469 | 272.089469 | 272.089469 | 1016.6 | 82.0 | 2.4 | 119.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2015-01-01 04:00:00+01:00 | 272.145900 | 272.145900 | 272.145900 | 1016.6 | 82.0 | 2.4 | 118.4 | 0.0 | 0.0 | 0.0 | 0.0 |
# Merge on the timestamp and keep the first year of hourly rows
df = weather_df.merge(energy_df, on='time')[:8760].copy()
# Strip the UTC offset (e.g. '+01:00') from each timestamp string
df['time'] = df['time'].str.split('+').str[0]
df.head()
| | time | temp | temp_min | temp_max | pressure | humidity | wind_speed | wind_deg | rain_1h | rain_3h | ... | generation solar | generation waste | generation wind offshore | generation wind onshore | forecast solar day ahead | forecast wind onshore day ahead | total load forecast | total load actual | price day ahead | price actual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 00:00:00 | 272.491463 | 272.491463 | 272.491463 | 1016.4 | 82.4 | 2.0 | 135.2 | 0.0 | 0.0 | ... | 49.0 | 196.0 | 0.0 | 6378.0 | 17.0 | 6436.0 | 26118.0 | 25385.0 | 50.10 | 65.41 |
| 1 | 2015-01-01 01:00:00 | 272.512700 | 272.512700 | 272.512700 | 1016.2 | 82.4 | 2.0 | 135.8 | 0.0 | 0.0 | ... | 50.0 | 195.0 | 0.0 | 5890.0 | 16.0 | 5856.0 | 24934.0 | 24382.0 | 48.10 | 64.92 |
| 2 | 2015-01-01 02:00:00 | 272.099137 | 272.099137 | 272.099137 | 1016.8 | 82.0 | 2.4 | 119.0 | 0.0 | 0.0 | ... | 50.0 | 196.0 | 0.0 | 5461.0 | 8.0 | 5454.0 | 23515.0 | 22734.0 | 47.33 | 64.48 |
| 3 | 2015-01-01 03:00:00 | 272.089469 | 272.089469 | 272.089469 | 1016.6 | 82.0 | 2.4 | 119.2 | 0.0 | 0.0 | ... | 50.0 | 191.0 | 0.0 | 5238.0 | 2.0 | 5151.0 | 22642.0 | 21286.0 | 42.27 | 59.32 |
| 4 | 2015-01-01 04:00:00 | 272.145900 | 272.145900 | 272.145900 | 1016.6 | 82.0 | 2.4 | 118.4 | 0.0 | 0.0 | ... | 42.0 | 189.0 | 0.0 | 4935.0 | 9.0 | 4861.0 | 21785.0 | 20264.0 | 38.41 | 56.04 |
5 rows × 37 columns
sum_biomass = df['generation biomass'].sum()
sum_coal = df['generation fossil brown coal/lignite'].sum()
sum_gas = df['generation fossil gas'].sum()
sum_hard_coal = df['generation fossil hard coal'].sum()
sum_oil = df['generation fossil oil'].sum()
sum_hydro = df['generation hydro pumped storage consumption'].sum() + df['generation hydro run-of-river and poundage'].sum() + df['generation hydro water reservoir'].sum()
sum_nuclear = df['generation nuclear'].sum()
sum_other = df['generation other'].sum() + df['generation other renewable'].sum() + df['generation waste'].sum() + df['generation fossil oil'].sum() + df['generation fossil brown coal/lignite'].sum() + df['generation biomass'].sum()
sum_solar = df['generation solar'].sum()
sum_waste = df['generation waste'].sum()
sum_wind = df['generation wind onshore'].sum()
labels = ['Gas', 'Hard coal', 'Hydro', 'Nuclear', 'Other', 'Solar', 'Wind']
values = [sum_gas, sum_hard_coal, sum_hydro, sum_nuclear, sum_other, sum_solar, sum_wind]
layout = go.Layout(
    height=600,
    title='Overview of energy generated in Spain, 2015',
    showlegend=False,
    hovermode=False,
)
# Donut chart of the total energy generated per source
fig = go.Figure(
    data=[go.Pie(
        labels=labels,
        values=values,
        hole=.8,
        textinfo='label+percent',
        marker=dict(colors=px.colors.qualitative.T10),
        textposition='outside',
    )],
    layout=layout,
)
fig.show()
This plot gives an overview of how the energy generated in Spain in 2015 is distributed, in percent, across the energy sources.
A story about the benefits of wind energy
# Pro
color_wind = 'blue'
color_solar = 'orange'
color_hydro = 'pink'
fig = go.Figure()
fig.add_trace(go.Violin(y=df['generation wind onshore'], meanline=dict(color=color_wind), name='Wind energy generated', marker_color=color_wind))
fig.add_trace(go.Violin(y=df['generation solar']*df['generation wind onshore'].mean()/df['generation solar'].mean(), meanline=dict(color=color_solar), name='Solar energy generated', marker_color=color_solar))
fig.add_trace(go.Violin(y= df['generation hydro water reservoir'] * df['generation wind onshore'].mean() / df['generation hydro water reservoir'].mean(), name='Hydro energy generated', meanline=dict(color=color_hydro), marker_color=color_hydro))
fig.update_layout(
    hovermode=False,
    showlegend=False,
    title='Distribution of wind and solar energy scaled to the same average',
    xaxis=go.layout.XAxis(title='Different sustainable energy generation methods'),
    yaxis=go.layout.YAxis(title='Energy generated (MW)'),
)
fig.show()
This violin plot compares three sustainable energy generation methods, shown on the x-axis, while the y-axis shows the distribution of generated energy in megawatts. The solar and hydro series have been scaled to the same mean as wind energy, so that the spread of each source around the mean line can be compared directly. The distribution of wind energy is compact in contrast to solar energy, which fluctuates strongly. The distribution of hydro energy resembles that of wind energy, albeit with a higher peak and a broader base.
These findings highlight that among the three sustainable energy generation methods analyzed, wind energy demonstrates the highest consistency.
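One way to put a number on this kind of consistency is the coefficient of variation (standard deviation divided by the mean), which does not change when a series is rescaled. The sketch below is not part of the original analysis; the function name and the toy values are assumptions for illustration only.

```python
import pandas as pd

def coefficient_of_variation(series: pd.Series) -> float:
    """Standard deviation divided by the mean; unitless and scale-invariant."""
    return series.std() / series.mean()

# Toy values, not taken from the dataset: both columns have a mean of 10
toy = pd.DataFrame({
    'steady': [9.0, 10.0, 11.0, 10.0],
    'spiky': [0.0, 0.0, 20.0, 20.0],
})

# A lower value means the source stays closer to its own average
cv = toy.apply(coefficient_of_variation)
print(cv.sort_values())
```

Applied to the real data, the same function could be called on columns such as `generation wind onshore` and `generation solar`; because the measure is scale-invariant, the rescaling used for the violin plot is not needed.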
# Pro
smoothed_values_wind = df['generation wind onshore'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_solar = df['generation solar'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_hydro = df['generation hydro water reservoir'].rolling(365, min_periods=1, center=True).mean()
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_solar, name='Solar', mode='lines', stackgroup='one', line=dict(color='orange')))
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_hydro, name='Hydro', mode='lines', stackgroup='one', line=dict(color='pink')))
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_wind, name='Wind', mode='lines', stackgroup='one', line=dict(color='blue')))
fig.update_layout(title='Energy contribution of wind, hydro and solar', hovermode=False, xaxis=go.layout.XAxis(title='Date'), yaxis=go.layout.YAxis(title='Energy generated (MW)'))
fig.show()
In the previous graph (Distribution of wind and solar energy scaled to the same average), wind energy was the most consistent of the three sources. The stacked area chart above visualizes the cumulative energy contribution of the same three sources, smoothed with a rolling mean. The x-axis shows the dates at two-month intervals, while the y-axis shows the generated energy in megawatts. The area representing wind energy is clearly the largest, indicating that wind energy contributes the most of the three to sustainable energy production.
# Pro
fig = go.Figure()
# Scatter plot for solar energy (shown by default)
fig.add_trace(go.Scatter(
    x=df['forecast solar day ahead'],
    y=df['generation solar'],
    mode='markers',
    name='Solar Energy',
    marker=dict(color='orange'),
))
# Scatter plot for wind energy (hidden until selected in the dropdown)
fig.add_trace(go.Scatter(
    x=df['forecast wind onshore day ahead'],
    y=df['generation wind onshore'],
    mode='markers',
    visible=False,
    name='Wind Energy',
    marker=dict(color='blue'),
))
fig.update_layout(
    title='Forecast vs. generated solar energy',
    xaxis=go.layout.XAxis(title='Forecast energy day ahead (MW)'),
    yaxis=go.layout.YAxis(title='Energy generated (MW)'),
    # Dropdown that toggles trace visibility and updates the title
    updatemenus=[
        dict(
            active=0,
            buttons=[
                dict(label='Solar',
                     method='update',
                     args=[{'visible': [True, False]},
                           {'title': 'Forecast vs. generated solar energy'}]),
                dict(label='Wind',
                     method='update',
                     args=[{'visible': [False, True]},
                           {'title': 'Forecast vs. generated wind energy'}]),
                dict(label='Both',
                     method='update',
                     args=[{'visible': [True, True]},
                           {'title': 'Forecast vs. generated solar and wind energy'}]),
            ],
        )
    ],
)
fig.show()
The three energy sources discussed above are weather-dependent. The dataset includes day-ahead forecasts of the energy generated from wind and solar sources. The scatter plot shows the relationship between the forecast and the actual energy values. The x-axis represents the forecast energy, while the y-axis represents the actual generated energy, both in megawatts. The dropdown menu switches between solar energy, wind energy, or both.
In the scatter plot, the wind energy data points form a linear pattern with only a few outliers, suggesting a strong correlation between the forecast and the actual generated wind energy. The solar energy data points are more scattered. The forecast therefore tracks actual generation more closely for wind than for solar, which means that wind energy is more predictable.
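The visual impression from the scatter plot could be backed by a Pearson correlation coefficient per source. The following is a minimal sketch: `forecast_correlation` is a hypothetical helper, the column names follow the dataset, and the toy values are invented, so the printed numbers say nothing about the real data.

```python
import pandas as pd

def forecast_correlation(frame: pd.DataFrame, forecast_col: str, actual_col: str) -> float:
    """Pearson correlation between forecast and actual values, ignoring rows with gaps."""
    pair = frame[[forecast_col, actual_col]].dropna()
    return pair[forecast_col].corr(pair[actual_col])

# Invented toy values that mimic a tight (wind) and a loose (solar) relationship
toy = pd.DataFrame({
    'forecast wind onshore day ahead': [100.0, 200.0, 300.0, 400.0],
    'generation wind onshore': [110.0, 190.0, 310.0, 390.0],
    'forecast solar day ahead': [100.0, 200.0, 300.0, 400.0],
    'generation solar': [300.0, 100.0, 400.0, 200.0],
})

r_wind = forecast_correlation(toy, 'forecast wind onshore day ahead', 'generation wind onshore')
r_solar = forecast_correlation(toy, 'forecast solar day ahead', 'generation solar')
print(f'wind r = {r_wind:.3f}, solar r = {r_solar:.3f}')
```

A coefficient close to 1 corresponds to points lying near a straight line, while a value near 0 corresponds to a scattered cloud.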
# Pro
df['date'] = pd.to_datetime(df['time'])
fig = px.scatter(df, x=df['wind_speed'], y=df['generation wind onshore'], color=df['price actual'], opacity=1, animation_frame=df['date'].dt.month_name())
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 3000
fig.update_layout(title='Monthly comparison of the effect of wind speed on generated wind energy and energy price', xaxis_title='Wind speed (m/s)', yaxis_title='Wind energy generated (MW)')
fig.show()
The interactive scatter plot illustrates the relationship between wind speed, generated wind energy, and energy price. The x-axis represents wind speed in meters per second, while the y-axis displays the generated wind energy in megawatts. The color of each dot indicates the actual energy price at that hour. The slider at the bottom steps through the months of 2015. In most months, there is a slight positive correlation between wind speed and generated wind energy, as well as a positive correlation between wind speed and energy prices. However, both correlations are inverted in June and July.
These findings suggest that wind energy is generated even at low wind speeds, which refutes the common misconception that low wind speeds yield no wind energy. This showcases the reliability of wind-generated energy. In addition, higher wind velocities are associated with a significant reduction in overall energy prices.
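The month-by-month correlations read from the animation could be computed directly. This is a sketch under assumptions: `monthly_correlation` is a hypothetical helper, and the toy frame is constructed so that one month has a rising and the other a falling relationship; it does not reproduce the real values.

```python
import pandas as pd

def monthly_correlation(frame: pd.DataFrame, x_col: str, y_col: str,
                        time_col: str = 'time') -> pd.Series:
    """Pearson correlation between two columns, computed separately per calendar month."""
    months = pd.to_datetime(frame[time_col]).dt.month
    return pd.Series({
        month: group[x_col].corr(group[y_col])
        for month, group in frame.groupby(months)
    })

# Invented toy data: January rises with wind speed, June falls
toy = pd.DataFrame({
    'time': ['2015-01-01 00:00:00', '2015-01-01 01:00:00', '2015-01-01 02:00:00',
             '2015-06-01 00:00:00', '2015-06-01 01:00:00', '2015-06-01 02:00:00'],
    'wind_speed': [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'generation wind onshore': [10.0, 20.0, 30.0, 30.0, 20.0, 10.0],
})

result = monthly_correlation(toy, 'wind_speed', 'generation wind onshore')
print(result)
```

Calling the same helper with `price actual` as the second column would give the monthly correlation between wind speed and price.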
A story about the drawbacks of wind energy
# Con
color_wind = '#800080'
color_nuclear = '#FFFF00'
fig = go.Figure()
fig.add_trace(go.Box(y=df['generation wind onshore'], name='Wind energy generated', marker_color=color_wind))
fig.add_trace(go.Box(y=df['generation nuclear']*df['generation wind onshore'].mean()/df['generation nuclear'].mean(), name='Nuclear energy generated', marker_color=color_nuclear))
fig.update_layout(
    hovermode=False,
    showlegend=False,
    title='Distribution of wind and nuclear energy scaled to the same average',
    xaxis=go.layout.XAxis(title='Different energy generation methods'),
    yaxis=go.layout.YAxis(title='Energy generated (MW)'),
)
The box plot compares the consistency of two energy generation methods. The two methods are shown on the x-axis, while the y-axis displays their generated energy in megawatts, with nuclear energy scaled to the same mean as wind energy.
The box for wind energy shows a wide spread, with outliers at the upper end. The box for nuclear energy is compact, with a small interquartile range. This stark contrast indicates that nuclear energy is notably more consistent than wind energy. Compared with nuclear energy, wind energy is therefore inconsistent.
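The width of each box can also be expressed as a single number: the interquartile range divided by the median. The sketch below uses a hypothetical helper and invented values that share a median of 10; it illustrates the measure and is not a result from the dataset.

```python
import pandas as pd

def relative_iqr(series: pd.Series) -> float:
    """Interquartile range (Q3 - Q1) divided by the median."""
    q1, q3 = series.quantile([0.25, 0.75])
    return (q3 - q1) / series.median()

# Invented values with the same median but very different spreads
wide = pd.Series([0.0, 5.0, 10.0, 15.0, 20.0])
narrow = pd.Series([9.0, 9.5, 10.0, 10.5, 11.0])

spread_wide = relative_iqr(wide)
spread_narrow = relative_iqr(narrow)
print(spread_wide, spread_narrow)
```

A larger value corresponds to a taller box relative to its center, which is the pattern described for wind energy above.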
# Con
color_wind = '#800080'
color_coal = '#FFFF00'
total = df['total load actual']

def share_of_load(values):
    # Fraction of the actual total load; values outside (0, 1), including NaN, become 0
    share = values / total
    return share.where((share > 0) & (share < 1), 0)

df_2 = pd.DataFrame({
    'wind': share_of_load(df['generation wind onshore']),
    'coal': share_of_load(df['generation fossil hard coal']),
    'nuclear': share_of_load(df['generation nuclear']),
})
# Centered rolling mean over 365 hourly rows (roughly 15 days) to smooth the curves
smoothed_values_wind = df_2['wind'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_coal = df_2['coal'].rolling(365, min_periods=1, center=True).mean()
smoothed_values_nuclear = df_2['nuclear'].rolling(365, min_periods=1, center=True).mean()
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_wind, name='Percentage wind energy', line=dict(color=color_wind)))
fig.add_trace(go.Scatter(x=df['time'], y=smoothed_values_coal, name='Percentage coal energy', line=dict(color=color_coal)))
fig.update_layout(
    title='Percentage contribution of total energy used',
    hovermode=False,
    xaxis=go.layout.XAxis(title='Dates'),
    yaxis=go.layout.YAxis(
        title='Percentage of total energy contribution',
        tickformat=',.0%',
    ),
)
fig.show()
This line plot illustrates the contribution of wind and hard coal energy as a percentage of the actual total load throughout 2015. The x-axis represents the dates, while the y-axis displays each source’s percentage of the total.
The lines for wind and coal energy almost mirror each other across a horizontal axis: when the share of wind drops, the share of coal rises. This indicates that wind energy is not a reliable energy source and that its lack of consistency is compensated by burning coal, which increases Spain’s use of unsustainable and polluting energy sources.
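The mirroring of the two lines can be checked numerically: two curves that mirror each other have a correlation close to -1. The sketch below is an assumption-laden illustration, with a hypothetical helper and invented shares in which coal fills exactly the gap left by wind; it reuses the centered rolling mean from the plot but with a short window to suit the toy data.

```python
import pandas as pd

def smoothed_share_correlation(share_a: pd.Series, share_b: pd.Series,
                               window: int = 365) -> float:
    """Correlation between two series after a centered rolling mean."""
    smooth_a = share_a.rolling(window, min_periods=1, center=True).mean()
    smooth_b = share_b.rolling(window, min_periods=1, center=True).mean()
    return smooth_a.corr(smooth_b)

# Invented shares of the total load: coal covers whatever wind does not, up to 50%
toy_wind = pd.Series([0.1, 0.2, 0.3, 0.4, 0.3, 0.2])
toy_coal = 0.5 - toy_wind

r_mirror = smoothed_share_correlation(toy_wind, toy_coal, window=3)
print(round(r_mirror, 3))
```

On the real data, a strongly negative value for the wind and coal shares would support the mirroring described above, while a value near zero would not.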
Appendix: ChatGPT was used to improve the coherence of the English and to find synonyms.